Prom mechanical folding trajectories to intrinsic energy landscapes of biopolymers 
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In single molecule laser optical tweezer (LOT) pulling experiments a protein or RNA is juxtaposed 
between DNA handles that are attached to beads in optical traps. The LOT generates folding 
trajectories under force in terms of time-dependent changes in the distance between the beads. 
How to construct the full intrinsic folding landscape (without the handles and the beads) from the 
measured time series is a major unsolved problem. By using rigorous theoretical methods — which 
account for fluctuations of the DNA handles, rotation of the optical beads, variations in applied 
tension due to finite trap stiffness, as well as environmental noise and the limited bandwidth of the 
apparatus — we provide a tractable method to derive intrinsic free energy profiles. We validate the 
method by showing that the exactly calculable intrinsic free energy profile for a Generalized Rouse 
Model, which mimics the two-state behavior in nucleic acid hairpins, can be accurately extracted 
from simulated time series in a LOT setup regardless of the stiffness of the handles. We next apply 
the approach to trajectories from coarse grained LOT molecular simulations of a coiled-coil protein 
based on the GCN4 leucine zipper, and obtain a free energy landscape that is in quantitative 
agreement with simulations performed without the beads and handles. Finally, we extract the 
intrinsic free energy landscape from experimental LOT measurements for the leucine zipper, which 
is independent of the trap parameters. 



The energy landscape perspective has provided a con- 
ceptual framework to describe how RNA [1] and pro- 
teins [2-4] fold. Some of the key theoretical predic- 
tions, such as folding of proteins and RNA by the ki- 
netic partitioning mechanism [5] and the diversity of fold- 
ing routes [6], have been confirmed by a number of ex- 
periments [7]. More refined comparisons require map- 
ping the full folding landscape of biomolecules, which has 
been difficult to achieve. The situation has dramatically 
changed with advances in laser optical tweezer (LOT) 
experiments, which have been used to obtain free energy 
profiles as a function of the extension of biomolecules 
under tension [7-10, 12]. 

The usefulness of the LOT technique, however, hinges 
on the crucial assumption that information about the 
fluctuating biomolecule can be accurately recovered from 
the raw experimental data, namely the time-dependent 
changes in the positions of the beads in the optical traps, 
attached to the biomolecule by double-stranded DNA 
handles [Fig. 1] . Thus, we only have access to the intrin- 
sic folding landscape of the biomolecule (in the absence 
of handles and beads) indirectly through the bead-bead 
separation along the force direction. Many extraneous 
factors, such as fluctuations of the handles [13, 14], ro- 
tation of the beads, and the varying applied tension due 
to finite trap stiffness, can severely distort the intrinsic 
folding landscape. Moreover, the detectors and electronic 
systems used in the data collection have finite response 
times, leading to filtering of high frequency components 
in the signal [18]. Ad hoc attempts have been made to ac- 
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FIG. 1. Dual beam optical tweezer setup for studying the 
equilibrium folding landscape of a single protein molecule un- 
der force. 



count for handle effects based on experimental estimates 
of stretched DNA properties, employing techniques sim- 
ilar to image deconvolution [8, 10, 22]. Theory has been 
used to extract free energy information from nonequilib- 
rium pulling experiments [17], and to determine the in- 
trinsic power spectrum of protein fluctuations [18] from 
LOT data. However, to date there has been no compre- 
hensive theory to model and correct for all the systematic 
instrumental distortions of the underlying folding land- 
scapes of proteins and RNA. 

A crucial unsolved problem is how can one construct 
the intrinsic free energy profile of a biomolecule using the 
measured folding trajectories in the presence of beads 
and handles (the total separation Ztot(i) in Fig- 1 as a 
function of time t). Here, we solve this problem using a 
rigorous theoretical procedure. Besides z to t{t), the only 
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input needed in our theory are the bead radii, the trap 
strengths and positions, and handle characteristics such 
as the contour length, the persistence length, and the 
elastic stretch modulus. The output is the intrinsic free 
energy as a function of the biomolecular extension (z p in 
Fig. 1) in the constant force ensemble. 

We validate our approach using two systems: (i) a gen- 
eralized Rouse model (GRM) hairpin [6], which has an 
analytically solvable double-well energy landscape under 
force; this allows a direct test of the method; (ii) a double- 
stranded coiled-coil protein based on the yeast transcrip- 
tional factor GCN4 leucine zipper domain, whose folding 
landscape was studied using a LOT experiment [10]. We 
first use coarse-grained molecular simulations to obtain 
the intrinsic free energy landscape of the isolated pro- 
tein at a constant force. We then simulate mechanical 
folding trajectories using the full LOT setup, from which 
we quantitatively recover the intrinsic free energy land- 
scape of GCN4, thus further establishing the efficacy of 
our theory. Finally, we apply our theory to experimen- 
tally generated data, and show that we can get reliable 
estimates for the protein energy profile independent of 
the optical trap parameters. 



I. RESULTS 

A. Theory for constructing the intrinsic protein 
folding landscape from measurements 

In a dual beam optical tweezer setup (Fig. 1) the 
protein is covalently connected to double-stranded DNA 
handles that are attached to glass or polystyrene beads in 
two optical traps. For small displacements of the beads 
from the trap centers [1], the trap potentials are har- 
monic, with strengths k x = k z = fctrap along the lateral 
plane, and a weaker axial strength k y = ak tra , p , where 
a < 1 [2]. For simplicity, we take both traps to have 
equal strengths, though our method can be generalized 
to an asymmetric setup. The trap centers are separated 
from each other along the z axis, with trap 1 at z = 
and trap 2 at z — z trap . As the bead-handle-protein 
(bhp) system fluctuates in equilibrium, the positions of 
the bead centers, ri(i) and r 2 (t), vary in time. We as- 
sume that the experimentalist can collect a time series 
of the z components of the bead positions, z\(t) and 
z 2 (t). Denote the mean of each time series as z\ and 
z 2 . We assume that the trap centers are sufficiently far 
apart that the whole system is under tension, which im- 
plies that the mean bead displacements are non-zero, 
z\ = z trap — z 2 — F/ktia P > 0, where F is the mean 
tension along z. We focus on the case where there is no 
feedback mechanism to maintain a constant force, so the 
instantaneous tension in the system changes as the total 
end-to-end extension component z to t(t) = z 2 (t) — zi(t) 
[Fig. 1] varies. Though we choose one particular pas- 
sive setup, the theory can be adapted to other types of 
passive optical tweezer systems [1, 8], where the force 



is approximately constant (in which case we could skip 
the transformation into the constant-force ensemble de- 
scribed below). The mean tension F, a measure of the 
overall force scale, can be tuned at the start of the experi- 
ment by making the trap separation z trap larger (leading 
to higher F) or smaller (leading to lower F). Because 
F = fctrap(- z tra P — z tot )/2, the precise relationship between 
^trap and F requires knowing the mean total extension 
z to t , which depends among other things on the details of 
the energy landscape. Hence, we cannot in general cal- 
culate beforehand what F will be for a given z trap . How- 
ever, one of the advantages of our approach is that we 
can combine data from different experimental runs (each 
having a different z trap and F) to accurately construct 
the protein free energy profile. This combination is car- 
ried out through the weighted histogram analysis method 
(WHAM) [7] (sec Supplementary Information (SI) for de- 
tails) , in a spirit similar to earlier work in the context of 
optical tweezers [8, 9]. We first solve the problem of ob- 
taining the protein landscape based on a single observed 
trajectory of bead-to-bead separations specified as z to t as 
a function of t. 

The key quantity in the construction procedure is 
"Ptot^tot), the equilibrium probability distribution of z% Q t 
within the external trap potential, which can be directly 
derived from the experimental time series. The imperfect 
nature of the measured data, due to noise and low-pass 
filtering effects in the recording apparatus, will distort 
"Ptot^tot ), but we have developed a technique to model 
and approximately correct for these issues (see Finite 
Bandwidth Scaling (FBS) in the Methods). Once we have 
an experimental estimate for T-tot^tot), the objective is 
to find V p (z p ;F ), the intrinsic distribution of the pro- 
tein end-to-end extension component z p at some constant 
force F , whose value we are free to choose. (We will 
use tilde notation to denote probabilities in the constant- 
force ensemble.) The intrinsic protein free energy profile 
is F p {z pi F a ) = — ksT In V p (z p ; F ). The procedure, ob- 
tained from rigorous theoretical underpinnings described 
in detail in the SI, consists of two steps: 

1. Transformation into the constant-force ensemble. 
Given 'Ptot( 2; tot), we obtain the total system end- 
to-end distribution at a constant F using, 

Vtot(ztou F ) 

= C -1 e (SF z tot + 1 Pk trap ( Ztrap -z tot ) 2 ^ ( ZtQt ^ 

where (3 — 1/ksT and C is a normalization con- 
stant. The equation above applies in the case of a 
single experimental trajectory at a particular trap 
separation z trap . 

2. Extraction of the intrinsic protein distribution. In 
the constant-force ensemble, V tot = V h * V h * V p * 
Vh * Vh, relates the total end-to-end fluctuations 
T^tot^tot! ^o) to the end-to-end distributions for the 
individual components V a (z a ', Fq), where a denotes 
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bead (b), handle (h), or protein (p), and * is a ID 
convolution operator. For the beads, "end-to-end" 
refers to the extension between the bead center and 
the handle attachment point, projected along z. In 
Fourier space the convolution has the form: 

Aot(k; F ) = P b 2 (fc; F )P£(k; F )P p (fc; F ) 
= V bh (k;F Q )V p (k;F ), 

where V a {k; Fq) is the Fourier transform of 
V a (z a ; F ). Here Pbh, which is the result of con- 
volving all the bead and handle distributions, acts 
as the main point spread function relating the in- 
trinsic protein distribution P p to Ptot- Since Pbh 
can be modeled from a theoretical description of 
the handles and beads, we can solve for P p using 
Eq. (2) and hence find F p , the intrinsic free energy 
profile of the protein. 

The derivation of the procedure (given in the SI, along 
with technical aspects of its numerical implementation) 
shows the conditions under which the two step method 
works. The mathematical approximation underlying step 
1 becomes exact if either of the following hold: (i) 
k x = k y = 0; (ii) the full 3D total system end-to-end 
probability is separable into a product of distributions 
for longitudinal (z) and transverse (x, y) components. In 
general, condition (ii) is not physically sensible [6]. How- 
ever, if p tot is the typical length scale describing trans- 
verse fluctuations, then condition (i) is approximately 
valid when (3k tlap p 2 ot ^ 1- If this condition breaks down, 
accurate construction of the intrinsic energy landscape 
cannot be performed without knowledge of the transverse 
behavior. However, in the simulation and experimental 
results below, the force scales are such that transverse 
fluctuations are small, p to t ~ 0(1 nm), so to ensure con- 
dition (i) is met, we require that fc trap "C ksT/p 2 ot = 4.1 
pN/nm at T = 298 K. We use the experimental value 
^trap = 0.25 pN/nm in our test cases [10], which is well 
under the upper limit. In principle, one can choose any 
F , the force value of the constant force ensemble where 
we carry out the analysis. In practice, F should be cho- 
sen from among the range of forces that is sampled in 
equilibrium during the actual experiment, since this will 
minimize statistical errors in the final constructed land- 
scape. For example, setting F — F, the mean tension, 
is a reasonable choice. 

Step 2 depends on knowledge of ~Pbh(k;Fo), and thus 
the individual constant-force distributions of the beads 
and the handles in Fourier space. The point spread func- 
tion is characterized by: the bead radius the han- 
dle contour length L, the handle persistence length l p , 
and the handle elastic stretching modulus 7. In P h we 
also include the covalent linkers which attach the han- 
dles to the beads and protein. If we model these link- 
ers as short, stiff harmonic springs, we have two ad- 
ditional parameters: the linker stiffness n and natural 
length I. Using the extensible semiflexible chain as a 



model for the handles, we exploit an exact mapping be- 
tween this model and the propagator for the motion of 
a quantum particle on the surface of a unit sphere [4] to 
calculate the handle Fourier-space distribution to arbi- 
trary numerical precision. Together with analytical re- 
sults for the bead and linker distributions, we can thus 
directly solve for Pbh(k;F ). To verify that the analyt- 
ical model for the point-spread function can accurately 
describe handle/bead fluctuations over a range of forces, 
we have analyzed data from control experiments on a 
system involving only dsDNA handles attached to beads, 
where Ptot = Pbh (SI). The theory simultaneously fits re- 
sults for several experimental quantities measured on the 
same system: the distributions Pbh derived from three 
different trap separations, corresponding to mean forces 
Fo = 9.4 — 12.7 pN, and a force-extension curve. The 
accuracy of the model Pbh is w 1 — 3%, within the ex- 
perimental error margins. 



B. Robustness of the theory validated by 
application to an exactly soluble model 

We first apply the theory to a problem for which the 
intrinsic free energy profiles at arbitrary force are known 
exactly. The generalized Rouse model (GRM) hairpin 
(see SI for details) is a two-state folder whose full 3D equi- 
librium end-to-end distributions are analytically solvable. 
A representative GRM distribution Pqrm at Fo = 11.9 
pN is plotted in Fig. 2(a). Since Pqrm is cylindrically 
symmetric, the top panel shows a projection onto the 
(p = \J x 1 + y 2 , z) plane, while the bottom panel shows 
the further projection onto the z coordinate. The two 
peaks correspond to the native (N) state at small z, and 
the unfolded (U) state at large z. In order to model the 
optical tweezer system, we add handles and beads to the 
GRM hairpin, whose probabilities Ph and Pb (including 
transverse fluctuations) are illustrated in Fig. 2(b) and 
(c) . The full 3D behavior is derived in an analogous man- 
ner to the theory mentioned above for the ID Fourier- 
space distribution Vbh(k;F ) 01 the beads/handles; the 
only difference is that the transverse degrees of freedom 
are not integrated out. The 3D convolution of the sys- 
tem components, plus the optical trap contribution, gives 
the total distribution P tot in Fig. 2(d). The bead, han- 
dle, linker, and trap parameters are listed in SI Table SI. 
From Ptot one can calculate the mean total z extension 
and the mean tension, which in this case are Ztot = 1199 
nm, F = fc trap (ztrap - z tot )/2 = 11.9 pN. 

The i probability projection in the bottom panel of (d) 
is the information accessible in an experiment, and the 
computation of the intrinsic distribution in the bottom 
panel of (a) is the ultimate goal of the construction proce- 
dure. Comparing (a) and (d), two effects of the apparatus 
are visible: the GRM peaks have been partially blurred 
into each other, and the transverse (p) fluctuations have 
been enhanced. The handles provide the dominant con- 
tribution to both these effects. 
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FIG. 2. Generalized Rouse model (GRM) hairpin in an optical tweezer setup. The first row shows the exact end-to-end 
distributions along z for each component type in the system: a) GRM, b) dsDNA handle, c) polystyrene bead. The handle, 
bead and trap parameters are listed in Table SI (GRM column). Upper panels show the probabilities projected onto cylindrical 
coordinates (p = \f x 2 + y 2 , z), while the lower ones show the projection onto z alone, (d) The result for the total system 
end-to-end distribution, Vtot, derived by convolving the component probabilities and accounting for the optical traps, (e-g) 
The construction of the original GRM distribution "Pgrm starting from Vtot- (e) Vtot (purple) and Vtot (blue) as a function of 
z on the bottom axis, measured relative to z, the average extension for each distribution. For Vtot, the upper axis shows the z 
range translated into the corresponding trap forces F. After removing the trap effects, Vtot is the distribution for constant force 
Fq = 11.9 pN. (f) Fbh, describing the total probability at Fq of fluctuations resulting from both handles and the rotation of the 
beads, (g) The constructed solution for Pgrm (solid line), obtained by numerically inverting the convolution Vtot = Fbh *Fgrm • 
The exact analytical result for Pgrm is shown as a dashed line, zn is the position of the native state (N) peak. 



Figs. 2(e) through (g) illustrate the construction pro- 
cedure for the GRM optical tweezer system. Panel (e) 
corresponds to Step 1, with a transformation of the dis- 
tribution Vtot (whose varying force scale is shown along 
the top axis) into P t ot at constant force F = 11.9 pN. 
Step 2 uses the exact P D h, shown in real-space in panel 
(f), and produces the intrinsic distribution Pgrm, drawn 
as a solid line in (g). The agreement with the exact an- 
alytical result (dashed line) is extremely close, with a 
median error of 3% over the range shown. This devi- 
ation is due to the approximation in Step 1, discussed 
above, as well as the numerical implementation of the 
deconvolution procedure. 

As shown in our previous study [6], the smaller the 
ratio lp/L for the handles, the more the features of the 
protein energy landscape get blurred by the handle fluc- 
tuations. Since the experimentally measured total distri- 
bution always distorts to some extent the intrinsic protein 
free energy profile due to the finite duration and sam- 
pling of the system trajectory, more flexible handles will 



exacerbate the signal-to-noise problem. To illustrate this 
effect, we performed Brownian dynamics simulations of 
the GRM in the optical tweezer setup, with handles mod- 
eled as extensible, semiflexible bead-spring chains (see 
SI for details). In Fig. 3(a) we compare the free energy 
Ftot = — ks Tin Ptot for a fixed L = 100 nm and a varying 
lp/L, derived from the simulation trajectories, and the 

exact intrinsic GRM result J^grm = — ksT IiiVgum at 
Fo. When the handles are very flexible, with l p /L = 0.02, 
the energy barrier between the native and unfolded states 
almost entirely disappears in J^ot, with the noise making 
the precise barrier shape difficult to resolve. Remarkably, 
even with this extreme level of distortion, using our the- 
ory we still recover a reasonable estimate of the intrinsic 
landscape [Fig. 3(b)]. For each F to t in Fig. 3(a), panel 
Fig. 3(b) compares the result of the construction pro- 
cedure and the exact answer for J-"grm- Clearly some 
information is lost as l p /L becomes smaller, since the 
lp/L = 0.02 system does not yield as accurate a result as 
the ones with stiffcr handles. However in all cases the ba- 
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FIG. 3. Effects of handle characteristics on the free energy profile of the GRM in an LOT setup, (a) The total system free 
energy J-tot = — fesTlnPtot for fixed L = 100 nm, and varying ratios l p /L. All the other parameters are in Table SI (GRM 
column). The exact analytical free energy at Fo = 11.9 pN (dashed line) for the GRM alone, J-grm = — fcsThi'PGRM, is shown 
for comparison, (b) For each J-tot in (a), the construction of J-grm at Fo, together with the exact answer (dashed line), (c) For 
system parameters matching the experiment (Table SI), the variance of the point spread function Vbh broken down into the 
individual handle, bead, and linker contributions. The fraction for each component is shown as a function of varying handle 
elastic modulus 7. 



sic features of the exact J-grm are reproduced. Thus, the 
theoretical-based method works remarkably well over a 
wide range of handle parameters. This conclusion is gen- 
erally valid even when other parameters are varied (see 
Fig. S3 in the SI for tests at various Fo and fc tra p). The 
excellent agreement between the constructed and intrin- 
sic free energy profiles for the exactly solvable GRM hair- 
pin over a wide range of handle and trap experimental 
variables establishes the robustness of the theory. 

C. Intrinsic folding landscape of a simulated 
leucine zipper 

To demonstrate that the theory can be used to produce 
equilibrium intrinsic free energy profiles with multiple 
states from mechanical folding trajectories, we performed 
simulations of a protein in an optical tweezer setup. The 
simulations were designed to mirror the single-molecule 
experiment reported in Ref. [10], and to this end we 
studied a coiled-coil, LZ26 [26], based on three repeats 
of the leucine zipper domain from the yeast transcrip- 
tional factor GCN4 [11] (see Methods). The simple lin- 
ear unzipping of the two strands of LZ26 allows us to 
map the end-to-end extension to the protein configura- 
tion. Furthermore, the energy heterogeneity of the native 
bonds that form the "teeth" of the zipper leads to a non- 
trivial folding landscape with at least two intermediate 
states [10, 26, 28]. The more complex landscape of LZ26 
thus provides an additional stringent test of the proposed 
theory. 

The native (N) structure of LZ26 is illustrated on the 
right in Fig. 4 (from a simulation snapshot), with the 
two alpha-helical strands running from N-terminus at the 
bottom to C-terminus at the top. In the experiment 
a handle is attached to the N-terminus of each strand, 
and this is where the strands begin to unzip under ap- 
plied force. To prevent complete strand separation, the 
C-termini are cross-linked through a disulfide bridge be- 



tween two cysteine residues. Each alpha-helix coil con- 
sists of a series of seven-residue heptad repeats, with po- 
sitions labeled a through g. For the leucine zipper the a 
and d positions are the "teeth" , consisting of mostly hy- 
drophobic residues (valine and leucine) which have strong 
non-covalent interactions with their counterparts on the 
other strand. The exceptions to the hydrophobic pat- 
tern are the three hydrophilic asparagine residues in a 
positions on each strand (marked in blue in the struc- 
ture snapshots on the right of Fig. 4). As has been seen 
experimentally [10, 26] (and shown below through simu- 
lations), the weaker interaction of these asparagine pairs 
is crucial in determining the properties of the intermedi- 
ate folding states, a point we will return to in more detail 
in the Discussion. 

In analyzing the LZ26 leucine zipper system, we 
performed coarse-grained simulations using the Self- 
Organized Polymer (SOP) model [17] (full details in the 
SI, with selected parameters summarized in Table SI). 
The intrinsic free energy profile !F P — — fcsTln'Pp at 
Fb = 12.3 pN is shown in Fig. 4(a). The four prominent 
wells in J-" p as a function of z p correspond to four stages 
in the progressive unzipping of LZ26. At F = 12.3 pN 
all the states are populated, and the system fluctuates 
in equilibrium between the wells. The transition bar- 
rier between N and II exhibits a shallow dip that may 
correspond to an additional, very transiently populated 
intermediate. Since this dip is much smaller than fc^F, 
we do not count it as a distinct state. 

Like in the GRM example, adding the optical tweezer 
apparatus to the SOP simulation significantly distorts 
the measured probability distributions. In the first row of 
Fig. 5 sample simulation trajectory fragments are shown 
both for the protein-only case [Fig. 5(a)] at constant force 
Fo = 12.3 pN, and within the full optical tweezer system 
[Fig. 5(c)] with z trap = 503 nm. For the latter case we 
plot both Ztot(i) (purple) and z p (t) (gray), allowing us to 
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FIG. 4. Intrinsic characteristics of the LZ26 leucine zipper at constant Fq, derived from SOP simulations in the absence of 
handles/beads, (a) LZ26 free energy T v at Fo = 12.3 pN vs. end-to-end extension z. Representative protein configurations 
from the four wells (N, II, 12, U) are shown on the right, with asparagine residues colored blue, (b) The average fraction of 
native contacts between the two alpha- helical strands of LZ26 (the "zipper bonds") as a function of z. Listed to the left of the 
curve are the a and d residues in the heptads making up the amino acid sequence for each LZ26 strand, placed according to 
their position along the zipper. Asparagines (N) are highlighted in blue, (c) For the residues listed in (b), the residue contact 
energies used in the SOP simulation (rescaled BT [15] values). 



see how the bead separation tracks changes in the protein 
extension. The probability distributions V p and Vtot are 
plotted in Fig. 5(b) and (d) respectively. In Fig. 5(e), 
the distribution Vtot within the optical tweezer system 
is plotted for z trap = 503 nm. Though we only illus- 
trate this particular z trap value, ~ 260 trajectories are 
generated at different z trap and combined together using 
WHAM [7] (see SI) to produce a single Vtot at a con- 
stant force F = 12.3 pN [Fig. 5(e)]. We can then use 
our theoretical method to recover the protein free energy 
P v [Fig. 5(f)]. Despite numerical errors due to limited 
statistical sampling (both in the protein-only and total 
system runs), there is remarkable agreement between the 
constructed result and JF p derived from protein-only sim- 
ulations. This is particularly striking given that the total 
system free energy Jtot(^tot) = -k B T lnVtot{ztot), plot- 
ted for comparison in panels (f), shows how severely the 
handles/beads blur the energy landscape, reducing the 
energy barriers to a degree that the N state is difficult 
to resolve. The signature of N in Jtot^ot) i s a slight 
change in the curvature at higher energies on the left of 
the II well. However despite this, we still recover a basin 
of attraction representing the N state in the constructed 
J~ v . Overall, the results in (f) show that our theory can 
accurately produce the intrinsic free energy profiles us- 
ing only the simulated folding trajectories as input, thus 
proving a self-consistency check of the method for a sys- 
tem with multiple intermediates. 



D. Folding landscape of the leucine zipper from 
experimental trajectories 

As a final test of the efficacy of the theory we used the 
experimental time series data [10] to obtain T p . The data 
consists of two independent runs with the LZ26 leucine 
zipper, using the same handle/bead parameters for each 
run (see Table SI), but at different trap separations ztrap- 
We project the deconvolved landscape from each trajec- 
tory onto the mid-point force Fo where the two most 
populated states (II and U) have equal probabilities in 
V p . The values of F a derived from the two runs are the 
same within error bounds: 12.3 ± 0.9 and 12.1 ± 0.9 pN. 
The detailed deconvolution steps are shown for one run 
in the last row of Fig. 5, and the final result, the intrinsic 
free energy profile J-" p , is shown for both runs in Fig. 5(h) 
(solid and dotted blue curves respectively). Accounting 
for error due to finite trajectory length and uncertain- 
ties in the apparatus parameters, the median total un- 
certainty in each of the reconstructed landscapes is about 
0.4 ksT in the z range shown (see SI for full error analy- 
sis) . The landscapes from the two independent runs have 
a median difference of 0.3 fc^T, and hence the method 
gives consistent results between runs, up to a small ex- 
perimental uncertainty, an important test of its practical 
utility. The reproducibility of T v is a testament to the 
stability of the dual optical tweezer setup, allowing us 
to sample extensively from the energy landscape: each 
trajectory lasted for more than 100 s, and thus collected 
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FIG. 5. (a,b) A trajectory fragment and the probability distribution V p from SOP simulations of the LZ26 leucine zipper at 
constant force Fo = 12.3 pN in the absence of handles/beads. (c,d) A trajectory fragment and the total system distribution 
Vtot at ztrap = 503 nm. Panel (c) shows both the total extension z to t{t) (purple) and the protein extension z p (t) (gray). 
Triangles mark times when the protein makes a transition between states, and the arrows point to two enlarged portions of 
the trajectories. In all cases the 2-axis origin is z\\, the peak location of the II intermediate state, (e-g) Leucine zipper free 
energy profiles extracted from time series (third row = simulation, fourth row = experiment). The first column shows the 
total system end-to-end distribution Vtot, and the corresponding Vtot at constant force Fo = 12.3 pN. In the experimental case 
Fq = 12.3 ±0.9 pN is the mid-point force at which the II and U states are equally likely. For Vtot, ztrap = 503 nm (simulation), 
1553 ± 1 nm (experiment). Force scales at the top are the range of trap forces for Vtot- The second column shows the computed 
intrinsic protein free energy profiles J- p , compared to the total system profile, Ftot (shifted upwards for clarity), (f) SOP 
simulations for the protein alone at constant Fo provide a reference landscape, drawn as a dashed line, (h) The dashed curve 
is the reconstructed T p at the mid-point force Fo = 12.1 ± 0.9 pN, from a second, independent experimental trajectory, with 
ztrap = 1547 ± 1 nm. The J-" p curves have a median uncertainty of 0.4 ksT over the plotted range (see SI for error analysis). 
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<~ O(10 2 — 10 5 ) of the various types of transitions between 
protein states (the slowest transition, U — > 12, occurred 
on time scales of 0.4 — 0.6 s). 

Comparison between the experimental P p in panel (h) 
and the simulation result in (f) reveals a notable differ- 
ence: the landscape constructed using the experimen- 
tal data does not have four identifiable basins. The N 
state may not be discernible in the experiment because 
of the limited resolution of the apparatus (see below). 
The spacing between the II and 12 wells is similar in the 
simulation and experiment (w 9 — 13 nm), but that be- 
tween 12 and U is ~ 13 nm in the simulation versus 25 
nm in the experiment. This is likely due to a larger helix 
content in the unfolded state for the simulation case. 

II. DISCUSSION 

A. Origins of the variance in the point spread 
function 

Our theory for the point spread function Pbh can be 
used to understand the interplay of physical effects that 
relate the intrinsic protein distribution to the total sys- 
tem. To quantify the various contributions to Pbh, we 
calculated its variance. Since variances of probability 
distributions combine additively upon convolution, wc 
break down the variance of Pbh into the individual bead, 
handle, and linker contributions. Fig. 3(c) shows the 
fraction of the variance associated with each component 
as a function of the handle elastic stretching modulus 
7 at F = 12.3 pN, with R b = 500 nm, L = 188 nm, 
lp = 20 nm (the approximate experimental parameters 
from Ref. [10]). For any given value of 7, the height of 
each of the four colored slices represents four fractions. 
Though not directly measured in Ref. [10], we have as- 
sumed k — 200 kcal/mol-nm 2 , I — 1.5 nm for the linkers. 
The handle contribution is itself broken down into the 
"clastic" part, defined as the extra variance due to the 
finite stretching modulus 7, compared to an inextensiblc 
(7 — > 00) worm-like chain (WLC), and the remainder, 
which we call the WLC part. For the case of Ref. [10], 
7 = 400 pN. Since the length extension relative to the 
WLC result is w P0/7, we expect finite handle extensibil- 
ity to play a small role. However, the elastic contribution 
to the total Pbh variance at this 7 is 43%, comparable to 
the WLC contribution of 48%. Hence, in predicting Pbh 
correctly it is important to account for both the bending 
rigidity and elasticity of the handles, which are exactly 
modeled in our approach. 

B. Nature and location of the intermediate states 
in the leucine zipper energy landscape 

The folding landscape of LZ26 is apparently closely 
related to the pattern of residue-residue contact ener- 
gies between the two strands of the zipper [10, 26, 28]. 
SOP simulations give us a detailed picture of this re- 



lationship. The average fraction of intact inter-strand 
("zipper") bonds vs. extension z, in Fig. 4(b) is a mono- 
tonic curve, starting with the fully closed structure on 
top (N state, bond fraction near 1) to the fully open 
structure at the bottom (U state, bond fraction near 0). 
Listed along this curve are the individual residues at the 
a and d positions of the heptads in the sequence. Sev- 
eral features stand out: the transition barriers between 
the states show a steeper rate of zipper bond unravel- 
ing compared to the well regions. The change of slope 
from steep to more gradual descent occurs near the lo- 
cation of the asparaginc residues in the sequence, and 
the the well minima of II, 12, and U occur one or two 
residues after the asparagines. The correlation between 
well minima locations and asparagines agrees with the ex- 
perimental landscape [10], underscoring the importance 
of the weak, hydrophilic asparagine bonds that interrupt 
the hydrophobic valine/leucine pattern at the a/d posi- 
tions. The sequence of rescaled BT [15] energies used 
for the a/d native contacts is plotted in Fig. 4(c). The 
a/d bonds are all > 2.8 fcsP, except for the asparagines, 
which are less stable at 1.7 fc^T. 

C. Instrumental noise filtering, and the limits of 
the theoretical approach 

The difference in the number of wells in the simulation 
and experimental free energy landscapes of the leucine 
zipper is related to finite time and spatial resolution. The 
measured time series is subject to noise (environmental 
vibrations of the optical elements, detector shot noise), 
as well as low-pass filtering due to "parasitic" effects in 
the photodiodes and the nature of the electronic amplifi- 
cation circuits [18]. The standard experimental protocol 
often involves additional low-pass filtering as a way of re- 
moving noise and smoothing trajectories: for the leucine 
zipper every five data points (originally recorded at 10 /its 
intervals) are averaged together during collection to give 
a time step of 50 //s [10]; in other cases similar effects 
are achieved using Bcsscl filters [22]. Noise broadens the 
measured distribution of bead separations, while low-pass 
filtering narrows it. We developed the FBS technique 
(Methods and SI), based on the details of the specific ap- 
paratus used in the experiment, to estimate and correct 
for the distortions. For our system, the FBS theory pro- 
vides an excellent description, as we have verified in tests 
using both numerical simulations and experimental data 
(with and without the additional filtering). 

However the FBS theory can only apply corrections to 
peaks (i.e. distinct protein states) that we observe in the 
measured probability distributions. There is the possi- 
bility of protein states leaving no discernible signature in 
the recorded distribution. The N state in the leucine zip- 
per is only connected to the II state in the folding path- 
way. In the simulations, where the N state is directly ob- 
served, it has short mean lifetimes (< 6 ^s in the studied 
force range), and the Noll change involves the shortest 
mean extension difference (~ 8 nm) among all the tran- 
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sitions. If the N state in the actual protein has similar 
properties, it could be impossible to resolve it in the ex- 
perimental data for two different reasons: (i) Regardless 
of any additional filtering, the intrinsic low-pass charac- 
teristics of the apparatus filter out states with very short 
lifetimes. For our LOT setup, the effective low-pass filter 
time-scale for the detectors/electronics is Tf w 7 /us (SI), 
which is at the cutting edge of current technology. Thus, 
states with lifetimes < Tf will not appear as distinct 
peaks in the measured distribution, (ii) Independent of 
the filtering issues in detection/recording, environmental 
background noise in the time series also poses a prob- 
lem, particularly since we measure bead displacements, 
and these have signal amplitudes at high frequencies that 
are generally attenuated compared to the intrinsic ampli- 
tudes of the protein conformational changes. The reason 
for this is that the beads have much larger hydrodynamic 
drag than dsDNA handles or proteins, and their char- 
acteristic relaxation times t t in the optical traps may 
be comparable to or larger than the lifetime of a par- 
ticular protein state. The bead cannot fully respond to 
force changes on time scales shorter than its relaxation 
time [14]. For example, r r w 20 ^s in the leucine zipper 
experiment. If the lifetime of the N state at a particular 
force is much smaller than i>, protein transitions from 
II— >-N— >-Il will generally occur before the bead can relax 
into the N state equilibrium position. If the bead dis- 
placements associated with these transitions are smaller 
than the noise amplitude in the time series, the entire 
excursion to the N state will be lost to the noise. 



We can illustrate the finite response time of the bead 
using simulations where resolution is not limited by noise 
or apparatus filtering, allowing us to illustrate the rela- 
tionship between z tot (t) and z p (t), compared in two dif- 
ferent trajectory fragments in Fig. 5(c). Triangles in the 
figure indicate times where the protein makes a transi- 
tion between states. Changes in protein extension during 
these transitions are very rapid, and the bead generally 
mirrors these changes with a small time lag, as seen in 
the enlarged trajectory interval at t — 36 — 42 /is. When 
the protein makes sharp, extremely brief excursions (like 
a visit to the N state from II in the enlarged interval 
t = 90 — 96 lis), the corresponding changes in bead sep- 
aration are smaller and much less well-defined. In the 
presence of noise, such tiny changes would be obscured. 



Thus, we surmise that the N state is not observable due 
to some combination of apparatus filtering, noise, and fi- 
nite bead response time. Hence, the theory applied to the 
experimental data produces a landscape with only II, 12, 
and U wells, as opposed to the four wells produced from 
the simulation data. Our labeling of the basins in the 
landscape agrees with the earlier state identification [10], 
and provides an explanation for why the N state was not 
resolved. 



III. CONCLUSIONS 

Extraction of the energy landscape of biomolecules us- 
ing LOT data is complicated because accurate analy- 
sis depends on correcting for distortions due to system 
components on the measured result. We have solved 
this problem completely by developing a theoretically- 
based construction method that accounts for these fac- 
tors. Through an array of tests involving an analytically 
solvable hairpin model, coarse-grained protein simula- 
tions, and experimental data, we have demonstrated the 
robustness of the technique in a range of realistic scenar- 
ios. The method works for arbitrarily complicated land- 
scapes, as demonstrated by the analysis of the leucine 
zipper experimental data, producing consistent results 
when the same protein is studied under different force 
scales. 

IV. MATERIALS AND METHODS 

A. Finite Bandwidth Scaling (FBS) 

Probability distributions derived from experimental 
time series of bead-bead separations are corrupted by 
noise, low-pass filtering due to the apparatus, and in 
some cases additional filtering due to the data process- 
ing protocol. We developed FBS theory to model and 
correct for these effects (see SI for details), using infor- 
mation encoded in time series autocorrelations, together 
with earlier spectral characterization of the dual trap op- 
tical tweezer detector and electronic systems [18]. All the 
experimental distributions Vtot in the main text were first 
processed by FBS. 

B. Leucine zipper 

We use a variant of the coarse-grained self-organized 
polymer (SOP) model [17, 31], where each of the 176 
residues in LZ26 is represented by a bead centered at 
the C a position (see SI for details.) The a- helical 
secondary structure is stabilized by interactions which 
mimic (i, i + 4) hydrogen bonding [13]. We use residue- 
dependent energies for tertiary interactions [15]. 

C. Simulations 

We simulate (see SI for details) trajectories for both 
the protein alone and the full optical tweezer setup using 
an overdamped Brownian dynamics (BD) algorithm [33]. 
The handles used in the LOT setup [Fig. 1] are modeled 
as scmiflcxiblc chains. 
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TABLE SI. Parameters used in the GRM model and SOP simulation of the LZ26 leucine zipper, together with the corresponding 
quantities from two experimental runs [10] 



Parameter 


GKM 


bU.r bimulation 


Experiment 


Bead: Kb (nm) 


con 

500 


1 AA 

100 


coo 1 or 

500±25 


Irap: z trap (nm) 


1 OA A 


483 - 543 


1553±1, 1547±1 


trap. Ktrap ^plN/nmj 


n 9^ 


U.ZO 


n 9^-1-n no n 97-i-n n^b 


Trap: a 


1/3 


1/3 


unknown 


Handle: L (nm) 


100 


100 


188±2 


Handle: l p (nm) 


20 


20 


20±2 


Handle: 7 (pN) 


2780 


2780 


400±40 


Linker: k (kcal/mol-nm 2 ) 


200 


200 


200 c 


Linker: £ (nm) 


1.5 


1.5 


1.5 



a different separations correspond to two folding trajectories 
b left, right trap strengths 

c linker characteristics are assumed for the experimental case 



I. THEORY FOR FREE ENERGY CONSTRUCTION FROM MECHANICAL FOLDING TIME SERIES 



A. Optical trap Hamiltonian 



We begin with the Hamiltonian for the beads in the traps (Fig. I in the main text), which allows us to introduce 
the relevant variables of the system. If the displacements of the beads from the trap centers are small (< 100 nm for 
a laser of 1064 nm wavelength and bead radii ~ 0(100 nm) [1]), the trap Hamiltonian can be approximately written 
as: 

%rap(ri,r 2 ) = ^k x {x\ + x\) + ^k y {y\ + yf) + ^k z \z\ + (z trap - z 2 f] , (SI) 

where = (xi,yi, m) is the position of the ith bead center, k x , k y , k z are the trap strengths along each coordinate 
direction, and the two traps are positioned at z = and z = z trap respectively. Given the cylindrical symmetry of 
the optical traps around the y axis, we take k x = k z = k trllp and k y = ak tlap , where the weaker axial trapping is 
reduced by a factor a < 1 [2] . We assume both traps have equal strengths, though our method can be generalized to an 
asymmetric configuration, where the two traps have different strengths fct rap ,i 7^ fctrap.2- In this case the reconstruction 
procedure derived below is valid with the substitution fc t ra P = 2fctrap,i^trap,2/(fctra P ,i + fctrap,2)- 

We rewrite the Hamiltonian in Eq. (SI) by defining a total end-to-end coordinate r to t = r 2 — ri = (x to t,ytot, ztot), 
and a total center-of-mass coordinate R to t = r 2 + i"i = (X to t, Ytot, Z to t)- In terms of these variables, 'Htrap becomes: 

^trap(ri,r 2 ) = "Ht™ p (Rtot) + %trap( r tot), 
^traplR-tot) = ^k x X^ ot + -k y Y^ ot + -fc z (z t rap - Z tot ) 2 , (g 2 ) 

^trap( r tot) = -^k x xl ot + -k y y^ ot + -fcz(zt ra p — Ztot) 2 - 

The variables Ztot and Ztrap are explicitly labeled in Fig. 1 of the main text. 



13 



B. Equilibrium distribution of the system 

The equilibrium probability "Ptot(Rtot, r to t) of finding the beads at positions with a given R to t and r to t can be 
expressed as: 

7>tot(Rtot,r t ot) = Ae-^-p( Rt -)-^- P ( r -')Q tot (r tot ), (S3) 

where f3 = l/fc^T, A is a normalization constant, and Qtot(rtot) is the equilibrium probability of the total bead-handlc- 
protein system having bead separation r to t in the absence of the external trapping potential or any applied force. By 
translational symmetry Q to t is independent of the center-of-mass coordinates, and by rotational symmetry Qtot(rtot) = 
Qtot(|r to t|)- Thus, if we introduce cylindrical coordinates r tot = (p tot , (j) tot , z tot ), where p tot = ' x\ ot + y t 2 ot , 4> tot = 
tan~ 1 (y t ot/^tot), there is no angular dependence, so that Qtot(rtot) = Qtot(ptot, ^tot)- We are ultimately interested 
in the marginal probability 'Ptot(ztot), which can be derived from the experimental time series and forms the starting 
point of our theoretical procedure to obtain the desired free energy profile. We obtain "Ptot(ztot) from V tot (Rtot, rtot) 
by integrating over the R to t , Ptot and </> to t degrees of freedom: 

"Ptot ( ztot ) = J Ptotdptot #tot<£Rtot Vtot (Rtot, r to t) 

= B [ Ptotdptot#tote-^^p ( ' 5tot ^ t ° t ' Ztot) Qtot(ptot^tot) 

J (S4) 

= 2nB J p tot dp tote -^( fe »+^)^-^^(^--^t) 2 7 Q(fc x _ k v )p 2 to ^j Q to t(/Wtot) 
s , Ce -|/3fc,( Ztrap - Ztot ) 2 g tot(ztot) . 

Here B and C are constants that have absorbed the result of integrating over R tot and p to t respectively, and Jo 
is a modified Bessel function of the first kind. Up to the third line the calculation in Eq. (S4) is exact. In the 
last step we make the problem fully one-dimensional, by approximately relating £tot(ztot) to Qtot(ztot), defined 
as Qtot(^tot) = j Ptotdptot Qtot(Aot, ^tot)- We are forced to make this crucial approximation, because experiments 
have access only to the z fluctuations through "Ptot(ztot), but generally do not have complete information about the 
transverse components. As mentioned in the main text, the last step in Eq. (S4) becomes exact if: (i) k x = k y = 0; 
or (ii) when Q to t(Ptot, ^tot) is separable in the form Q to t(Ptot, ^tot) = /(Ptot)Qtot(^ot) for some function /. Though 
condition (ii) is not expected to be generally valid, we can approximately satisfy (i) when (3kt ra pPtot ^ 1> where p t ot 
is the typical length scale of total system fluctuations transverse to z. Thus, for sufficiently soft traps, we have in 
Eq. (S4) a useful relation between the z marginal probabilities of the total system with and without the external 
trapping potentials. 



C. Convolution 



Since Qtot(ztot) is the total end-to-end z-component distribution in the absence of any external trapping potential 
or applied force, the corresponding distribution for the total system with constant tension Fq applied to the beads 
along z is given by fi t ot(z t ou F ) = exp(f3F z tot )Q tot (ztot)- Substituting for Qtot(^ot) using Eq. (S4), we find the 
following relation for Aot(ztot! Fo), which constitutes Step 1 of our construction procedure in the main text: 

Aot^tot^o) « C-'e^+i^^-^Vtotiztot). (S5) 

The quantity T-'tot(ztot) on the right-hand side can be derived from the experimental time series, and thus Eq. (S5) 
allows us to obtain an equilibrium distribution in the constant force ensemble, Vtot{ztou Fo), directly from the folding 
trajectories. 
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In the constant force ensemble, P to t is just a ID convolution of the probabilities of the individual system components: 

Ptot =P b *P h *P p * P h * P h , (S6) 

where * denotes the convolution operator. The probability P\(z\; Fq) is the equilibrium distribution of z\ at constant 
force F , where A denotes a bead, handle, or protein. The quantity z\ is the end-to-end distance of A along z. Using 
the notation in Fig. 1 of the main text, we can give a few examples: for the protein z p = (r p2 — r p i) • z; for the left 
handle Zh = (r pl — r^) • z; for the left bead z D = (r' x — ri) • z. In Fourier space Eq. (S6), which is the key equation for 
Step 2 of the construction procedure in the main text, has a simple form: 

Ptot(k;F ) = P h l (k;F )P*(k;F )P p (k;F ) = P hh (k; F )P p (k; F ), (S7) 

where P\(k; F ) is the Fourier transform of P\{z\\Fq). Here P D h(k; F ) = P£(k; F n )P^(k; F ) is the Fourier transform 
of the convolution of all the bead and handle distributions. If the left and right handles (or analogously the beads) 
had distinct properties (i.e. different sizes) then the factor P^(k;Fo) in P D h(k;F ) would be replaced by the product 
Phi(k; Fo)Ph2(k; Fq) of the distinct handle terms. Given the rotational properties of the beads and modeling the 
handles as semiflexible polymers, we can derive a numerically exact form for the Fourier components P a h(k; F ), and 
hence by inversion the corresponding real space distribution. This will allow us to directly recover P p from Ptot, 
without resorting to an experimental estimate for the point spread function, which is problematic due to the varying 
force conditions that arise in optical traps with non-zero stiffness. 



D. Bead distribution 

The first step in finding P o h{k;F ) = P£(k; F )P^(k; F ) is to obtain an expression for the Fourier-space bead 
probability P D (k; F ). Taking as an example the left bead in Fig. 1, let rb = — ri be the vector between the bead 
center and the point on the bead surface that is attached to the handle. This vector has a fixed length R b given by 
the bead radius, but its direction can fluctuate, subject to a constant force F along z. The equilibrium distribution 
P h (r h ;F ) is given by: 

P b (r b ; F ) = A h e^S (|r b | - R b ) , (S8) 

with the delta function enforcing the constraint |i"b| = Rb, and the normalization constant A b . The quantity P b {k; F ) 
is the Fourier transform of P D (r o ;F ) evaluated at k = kz: 

P h (k;F ) = J dr h e- ikzb P h (v b ;F ) 

f3Fosmh((f3F -ik)R b ) ^ 
~ (PF -ik)smh(l3F R b )- 



E. Handle distribution 



Though the Fourier components Ph(k;Fo) of the semiflexible handle distribution do not have a simple analytic 
expression, they can be calculated numerically to arbitrary accuracy. The Hamiltonian for the semiflexible handle 
polymer with contour length L, persistence length l p , and elastic stretching modulus 7, can be exactly mapped onto 
the propagator of a quantum particle on the surface of a unit sphere [3, 4]. Following the approach in Ref. [4], we 
describe the polymer as a continuous spatial contour r(s) in terms of an unstrctched arc length s which runs from 
s = to s = L. At each point s we define a unit tangent vector u(s). The end-to-end distance r(L) — r(0) can be 
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written as, 



r(L) - r(0) 



ds(l + e(s))u(s), 



(S10) 



where 1 + e(s) is the local relative bond length extension. For an inextensible (7 — > 00) worm-like chain, e(s) = for 
all s, which corresponds to all bonds in the chain having fixed length. For finite 7, the e(s) are additional degrees 
of freedom in the system, which together with the unit tangent vectors u(s) completely define the contour. The 
Hamiltonian H(u(s), e(s)) for the semiflexible polymer under tension is, 



/3H(u( S ), e(s)) = l j£d S (d s u( S ) f - fz ■ (r(L) - r(0)) + ^ £ ds e 2 (s), 



= [ ds 
Jo 



l i(d s u(s)) 2 - /(I + e(s))u z (s) + ^e 2 (s) 



(Sll) 



where ji = l/ksT and we have used Eq. (S10) for the end-to-end distance. The first term in Eq. (Sll) corresponds 
to a bending energy parameterized by the persistence length l p , the second term is due to an applied mechanical 
force ksTf along z, and the third term describes the stretching energy of the bonds, with elastic modulus 7. For 
prestretching tension Fq, f = /3Fq, but for convenience we will extend the definition of H to include arbitrary / in 
order to obtain the Fourier components of the end-to-end probability distribution below. 



The partition function of the polymer (with free end boundary conditions) can be expressed as a path integral over 
all possible configurations of u(s) and e(s), with the constraint that u 2 (s) = 1 at each s: 



Z h (f) = Jvu(s)l[5(u 2 (s)-1) Jve(s) exp [-0«(u(«), e(s))] , 
= J Vu(s)l[S(u 2 (s) - 1) expheWeffM*))] 



(S12) 



up to some normalization constant. In the second line we have carried out the path integral over e(s) exactly to 
express Zh(/) in terms of an effective Hamiltonian - H c fr(u(s)) depending on the tangent vectors alone, 



u(*)) = f 
Jo 



ds 



l P(d s u(s)) 2 -fu z (s)-^u 2 z (s) 



2 y "° "~" J ~^~' 2/97 

The probability of finding the polymer in a configuration with an end-to-end extension Zh along z is given by [3] : 
A(z h ;F ) = / Vu(s)l[S(u 2 (s) 1) 



(S13) 



Ve{s) S [ z hand I ds(l + e(s))u z {s) I exp [-pH(u(a), e(a))] 



Vu(s)l[S(u 2 (s)-l) 



Z h ((3F ) 

Jve(s) J ^e lfc (^-/o L ^(i+^W)^W)exp[-/3H(u( S ),e( S ))] 

= J ^e ikz -V h (k;F ), 

where the Fourier components of the probability distribution are: 

Z h (l3F - ik) 



(S14) 



Vh(k;F Q ) = 



Z b ((3F ) 



(S15) 
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In order to evaluate Vh(k;F ), we need to calculate Zh(f). Let us define the propagator G(u ,ul; L) as the path 
integral over all configurations with initial tangent u(0) = u and final tangent u(L) = Uz,: 



™(I)=ul 

G(u ,u L ;L) = Vu(s)l[S(u 2 ( S )-l)e-^ u ^ . (S16) 

•/u(0)=u o 



This is related to the partition function through Z^{f) = (4ir) 2 J s du du^ G(u , u^; L), where the integrations are 
over the unit sphere S. 

The quantum Hamiltonian corresponding to l3H e s is 

^eff(/) = -^V 2 -/cosfl-^cos 2 0, (S17) 



describing a particle on the surface of a unit sphere, with 9 = defining the z direction. The propagator G can be 
written in terms of the quantum eigenvalues E n and eigenstates f/'ra( u ) of "H^: 

G(u ,u L ;L) = Y / e- E " L r n (uo)MuL)= £ e- E - L a* nVm ,a nlm Y{ lml (u )Y lm (u L ) , (S18) 

n n,l,m,l',m' 

where we have expanded the eigenstates in the basis of spherical harmonics, Vv»(u) = X^m a nimXim{^) ■ The coefficients 
a n im are the components of the nth eigenvector of the Hamiltonian T-C^ in the Y/ m basis. The partition function 
Zh(f) becomes: 

Zhtf) = tj^ / du du L e_£; " L a™(' m ' a ™;m^* m '(uo)yi m (u L ) 

\ n > Js n,l,m,l',m' 

1 V- -E n L * (S19) 
~~ 4^ 2^ 6 a n00 a "00 
n 

= (0|e-™e f u f (/)| ). 

In the last step we have written the expression as a single component of the exponentiated matrix H^g(f) in the 
(l,m) spherical harmonic basis, where \l) denotes a state (1,0). Since the Hamiltonian matrix in the to — subspace 
does not couple to to 7^ components, we only need to — matrix elements to evaluate Z^(f). The list of non-zero 
matrix entries in the to = subspace is: 



+ !) = (' + 1IWSWI0 = ijl^l + 3) ■ < = - 1 - 2 (S20) 

msm + 2 > - <« + to» - (a+ i;;,^K2 i+ 5)- < - * * • ■ ■ ■ 

To carry out the matrix exponent, we truncate the matrix at i max = 20, which is sufficiently large for numerical 
accuracy. 

In some experimental setups, covalent linkers are attached on both ends of each handle, connecting the handle to 
the neighboring bead and protein, as schematically drawn in Fig. Sl(a). The effect of linkers can be absorbed into 
the theory by modifying Ph(k; Fq). The simplest representation of a linker is a harmonic spring with stiffness k and 
natural length I. With one of these added at each end of the handle, Eq. (S15) becomes: 

Vh{k > Fa) = ^WFo) pwT' (S21) 
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a) 



handles 




FIG. SI. a) Schematic illustration (not to scale) of the covalent linkers which attach the handle to the bead on one end, and 
the handle to the protein on the other end. b) The Generalized Rouse Model (GRM) within the optical tweezer. 




where 



2iink(/; K,£) = 



1 



/V 2(0*) 



7T fv-2epK) 

e a £» 



+ e 2fe {f + £(3k) ( erf 



f + £(3K 



ferfc [ — ; ^ - 



£f3n 



(S22) 



F. Numerical deconvolution to extract the protein distribution 



The expressions given by Eq. (S9) and (S21) completely determine the Fourier-transformed point spread function 
Vbh(k;F ) at all k. Naively, one could use Eq. (S7) to write: 



V P (k;F Q ) = 



Vtot{k;F ) 



(S23) 



Since Vtot(k; Fq) is derivable from the experimental times series, this would immediately yield V p (k;Fo), and after 
inversion the ultimate goal, V p (z p ; F ). However, this direct deconvolution in Fourier space is numerically unstable [5], 
due to the effects of round-off noise and the denominator in the equation for V p (k;F ) approaching zero at large fc. 

To work around this problem, we implement the deconvolution in real space, by solving the following integral 
equation for V p (the real space version of Eq. (S7)): 



dz p P b h(^tot - z p ; F Q )V p (z p ; F ) = V to t(z tot ; F ). 



(S24) 



One way to approach Eq. (S24) is to approximate the integral as a matrix-vector product by discretizing the z to t and 
z p ranges. However, the convolution matrix corresponding to Vbh is generally ill-conditioned, so direct inversion to 
find a solution is unfeasible. Alternatively, to obtain robust, smooth results for the deconvolution, we can rewrite 
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Eq. (S24) by representing the three quantities V p , Vbh, and Vtot in terms of suitable fitting functions. Since these 
are all probability distributions, in practice we can approximate them to arbitrary precision as sums of Gaussians 
g(z; C, v) = (2TTV)- 1 / 2 exp(-(z - () 2 /(2v)), 



V a (z a ;F ) 



E 

i=l 



(S25) 



where a = p, bh, or tot. The number of Gaussians needed for each distribution, N a , is chosen depending on the 
problem. The two sets of parameters {a* ot , £* ot , w' ot } and {a|? , £ b , vf h } (which implicitly depend on F ) are computed 
by fitting to the known functions Vtot an d A>h- The goal of the procedure is then to use Eq. (S24) to solve for the 
parameter set {af, £?, vf} describing the unknown function V p . For the cases discussed in the main text, choosing 
N a = 2 — 3 was sufficient to find solutions V p such that the left and right-hand sides of Eq. (S24) had a median 
deviation < 1% over all z tot where ^tot^tot) ^ 10~ 6 . 

The details of the solution procedure are as follows: we choose N p — N tot , so that Eq. (S24) can be approximated as 
a one-to-one convolution mapping each Gaussian in V p into a corresponding Gaussian in Vtot- For alH = 1, . . . , N to t, 
Eq. (S24) describes the following relationships between the amplitudes, positions and variances of the Gaussians: 



,tot 



l i > 



N bh 

e t «E a f(^ bh +^ p ), 

3=1 

^ tot «E a fK bh +^ p + (^ h +a 2 )- 

3=1 



(S26) 



The approximation is exact when the point-spread function V^h is precisely a single Gaussian, but is generally valid 
whenever Vbh is close to Gaussian (as is the case for the bead-handle system, where the corrections introduced by 
choosing iVbh > 1 are small). Eq. (S26) can be inverted to yield the desired parameter set {a? ,£? ,vf}: 



Afbh 

er^r-E"^. 

3=1 

<~«r-E a i h 

3=1 



bh 

3 ' 



(S27) 



.bh 




where we have used the fact that X^ 11 flbh = ^ °^ uc to normalization. 



II. EXPERIMENTAL VERIFICATION OF THE MODEL FOR THE POINT-SPREAD FUNCTION 

In order to check that the theoretical model of the point-spread function 'Pbh derived in Sees. ID-IE is an accurate 
description of the handle and bead response in experiments, we analyzed control experiments on a system with only 
dsDNA handles and beads. Bead radii are — 500 ±25 nm, the trap strength is fc trap = 0.29 ±0.02 pN/nm, and the 
handle parameters are extracted from the theoretical best-fit described below. Four distinct experimental data sets 
are collected (Fig. S2): the first is from a pulling setup, where the trap separation is varied to give a trajectory of force 
F vs. total extension z (Fig. S2(a), blue curve); the other data sets are trajectories of extension z as a function of time 
collected at three different constant trap separations. These three trajectories can be binned, and projected onto the 
constant force ensemble using the same method (Eq. (S5)) as described above for the full system, yielding probability 
distributions Vbh(z; F ) for the total end-to-end extension of the bead-handle system (Fig. S2(b-d), blue curves). The 
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z [nm] z — z [nm] z — z [nm] z — z [nm] 



FIG. S2. Experimental (blue solid curves) and theory (red solid curves) results for a system containing only dsDNA handles and 
beads. The apparatus parameters and theoretical best-fit values are described in Sec. II. The same set of best-fit parameters 
is used for all the theory curves, (a) Force F vs. total extension z from an experimental pulling trajectory, compared to the 
theoretical mean extension as a function of force; (b-d) Total bead-handle probability end-to-end distance distributions Vhh 
(blue solid curves) derived from three experimental runs at different constant trap separations, corresponding to mean forces 
of 9.4 ±0.7, 11.5 ±0.8, and 12.7 ±0.9 pN respectively. In each case the experimental data is corrected for noise/filtering effects 
using the FBS method (Sec. VI), and transformed into the constant force ensemble using Eq. [1] of the main text, with Fo 
chosen to be equal to the mean force value in the trajectory. The distance scale is centered at z, the mean extension. The 
light blue shaded region around each blue curve represents the standard error margin for every point in the distribution (68% 
confidence band). For comparison, the experimental results omitting the FBS corrections are shown as gray dashed lines. 

constant force value Fq for each projection is chosen equal to the mean force in each of the three trajectories, namely 
F = 9.4 ± 0.7, 11.5 ± 0.8, and 12.7 ± 0.9 pN. 

The experimental data were collected at 100 kHz, with no additional time averaging beyond the electronic filtering 
intrinsic to the detection and recording apparatus. Prior to the projection onto the constant force ensemble, the 
FBS method (Sec. VI) was used to approximately correct the raw experimental data for distortions due to electronic 
filtering and noise. In the absence of these corrections, the Vbh(z; Fq) from the raw data is given by the dashed curves 
in Fig. S2(b-d). 

The standard error margin (68% confidence interval) for each point in the Vhh distribution is marked by a light blue 
band, reflecting uncertainties in apparatus and FBS parameters, as well as statistical error due to sampling. Details 
of the error estimation procedure are in Sec. VII. The median standard error in the z range shown varies from 3 — 5% 
between the three trajectories. 

We use the theoretical model of Sees. ID-IE to simultaneously fit all four experimental data sets with a single set 
of handle parameters, yielding best fit values: L = 173 ± 2 nm, l p = 11 ± 1 nm, and 7 = 520 ± 70 pN. The theory 
has excellent agreement with all the experimental results, with median deviations in Vhh(z; Fq) for the z range shown 
in Fig. S2(b-d) varying from 1 — 3% between the three trajectories, comparable to the standard error margins. The 
comparison between theory and experiments firmly establishes the remarkable accuracy of our theory in quantitatively 
describing the bead-handle system. 

III. GENERALIZED ROUSE MODEL (GRM) 
A. Hamiltonian and exact probability distribution for the GRM 

The GRM model [6], illustrated schematically in Fig. Sl(b), is a Gaussian chain with TV monomers, connected 
by N — 1 harmonic springs with an average extension a. A conformation of the GRM is specified by the monomer 
positions l*,, i = 1, . . . , N. To get behavior reminiscent of hairpin unzipping, an additional harmonic bond potential, 
F(|rjv — ri|), is added between the end-points ri and rjv; the force due to this potential is non-zero only if the 
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FIG. S3. The GRM free energy Fgrm after deconvolution for three different values of the force Fo: (a) 9.9 pN; (b) 11.9 pN; (c) 
14 pN. In each case, results for two different trap strengths fc tra p = 0.25, 2.5 pN/nm are shown as solid lines of different color. 
The exact analytical solutions are drawn as dashed lines. The z scale is plotted relative to z m i n , the location of the minimum 
in the free energy. 



end-point separation is within a cutoff distance, c. Under a constant external tension, FqZ, the GRM Hamiltonian is 

O. rp N-i 
i— 1 

where V(r) — kr 2 Q(c — r) + kc 2 0(r — c), and is the unit step function. We choose parameters: N = 18, Fo = 2.9 
itjjT/nm (11.9 pN), a = 1 nm, c = 12 nm, k = 0.09 fc B T/nm 2 (0.37 pN/nm). 

If we write the end-to-end vector rjv — i"i in cylindrical coordinates as (p, <f>, z), the exact probability distribution 
for this vector in equilibrium under constant force F z is given by: 

V GRM (P, 0, z; F ) = A GRM exp (- y ~ $ V Wp 2 + * 2 ) + P F o^j , (S29) 

where Aqrm is a normalization constant. This distribution, projected onto the (p, z) plane, is illustrated in the top 
panel of Fig. 2(a) in the main text. The peak at small z corresponds to the "folded" hairpin state (F) with an intact 
end-point bond, while the peak at larger z is the unfolded (U) state. Integrating Vqrm(p, 0, z; Fq) over p and <p one 
obtains the marginal probability 7 3 grm(z; Fq), 



V GR M(z;F )=A' GKM e ^W^ +m ' z 

2 (3+2a 2 /3fc(iV-l))-3s 2 

2? 




3e-^ 2 ( e -^-^ k +^d^) _j\ z<c (S30) 



z > c 



with normalization constant A' GBM . P G rm(z', Fq) is plotted in the lower panel of Fig. 2(a). 



B. Testing the GRM deconvolution at various forces and trap strengths 



In Fig. 3(b) in the main text we showed that the deconvolution results for the GRM are robust when varying the 
handle parameters. In Fig. S3 we demonstrate that the same conclusion holds when either the force Fq or the trap 
strength fctrap ar e varied. 
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IV. WHAM: COMBINING TRAJECTORIES FROM EXPERIMENTAL RUNS AT DIFFERENT TRAP 

SEPARATIONS 

The weighted histogram analysis method [7] (WHAM) is a powerful tool in analyzing optical tweezer experiments. 
By combining trajectories generated at different trap separations Zt iap (resulting in different force scales F), one can 
sample the full extent of the protein free energy landscape, and use WHAM to construct a single energy profile using 
all the trajectory data, as has been previously done in Ref. [8] (and in a related, but different manner in Ref. [9].) In 
the context of our theory, WHAM modifies Step 1 of our procedure, allowing us to derive the equilibrium probability 
Aot(2tot)-Fb) at constant force F based on information from multiple experimental trajectories. Consider a set of 

(i) 

M experimental runs, where the ith trajectory consists of rij data points and has a trap separation z\'. Except for 

(i) 

zl Iap , all other system parameters are kept the same between runs. For each run one can calculate the normalized 
histogram of total end-to-end distances z to t, yielding a probability distribution V^ t (ztot)- This distribution is related 
to Qtot^tot), the unbiased z to t probability in the absence of a trapping potential or external force, through Eq. (S4). 
Inverting that equation, we can write 

Qtot(^tot) « CrieiM'^-^'pglM 

where Ci is a normalization constant, Ui(z to t) — k z (z[ll p — z to t) 2 /4 and F-i — /? _1 hiCj. In the case of one 
trajectory (M = 1), Eq. (S31) is a way to estimate Qtot(^tot), from which one can calculate Ptot(ztot; Fq) = 
exp( / 3_F 2: tot)Qtot(ztot)- This is just the standard Step 1 procedure described earlier. 

When M > 1, Eq. (S31) provides a different estimate of Qtot(^tot) f° r each i, which ideally should be combined to 
give a single best approximation. The WHAM method resolves this problem, yielding a best estimate for Qtot(^tot) 
of the form: 



Q.u,l :..n) = -l _j :<=1 ^° t ^ ) (S82.) 

3- 



where A is a normalization constant. The unknown parameters {Fi} are given by: 



Fi = --In 

(3 



1 



dz tot Q tot (z tot )e 



-PUi(ztot) 



(S33) 



Eqs. (S32) and (S33) are a coupled system of equations for Qtot(ztot) and {Fi}. We solve these by making an initial 
guess for the set {Fi}, substituting it into Eq. (S32) to find Qtot(^tot); and using this estimate for Q tot (z tot ) in 
Eq. (S33) to find a new set of {Fi}. The process is iterated until we converge to a self-consistent solution to both 
equations. Once we have a best estimate of Qtot(ztot), we can calculate Aot(ztot! Fo) as above, completing Step 1 of 
the construction. 



V. LEUCINE ZIPPER SIMULATIONS 
A. SOP model for the LZ26 leucine zipper 

The amino acid sequence for a single a-helical strand of the LZ26 coiled coil is as follows (grouped into heptad re- 
peats): MCQLEQK VEELLQK NYHLEQE VARLKQL VGELEQK VEELLQK NYHLEQE VARLKQL VGELEQK 
VEELLQK NYHLEQE VARLKQL VGEC. The sequence is the same as in Ref. [10], except that we have left out 
four residues at the beginning and three from the final heptad, for a total of 88 residues per strand. As in the 
experiment [10], the handles are attached at the cysteine in position b of the first heptad, and the cross-linking be- 
tween strands is at the cysteine in position d of the last heptad. (For consistency when comparing simulations with 
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or without the handles/traps, end-to-end distance for the protein is always measured between the two N-terminal 
cysteines.) Although the crystal structure is not available for LZ26, it is believed to be similar to three GCN4 leucine 
zipper domains (PDB ID code 2ZTA) [11] in series. Thus, we constructed a model for the native structure based on 
GCN4, connecting the leucine zipper segments in such a way that the distances between neighboring C a positions 
and angles of superhelical coiling formed a continuous pattern as one moves along LZ26. 

Going from N- to C-terminus on one strand and returning C- to N-terminus on the other, let us label the residues 
i = 1, . . . , N TCS , where N ICS = 176, where 1 < i < 88 corresponds to one strand, and 88 < i < 176 corresponds to the 
other. Every non-neighboring pair of residues, where \i — j\ > 1, is assigned to one of three sets: S (secondary 
structure pairs), T (tertiary structure pairs), and 1Z (remainder). The set S consists of all pairs where = 4 and 

i, j share the same strand, representing residues interacting through a-helical hydrogen bonding. The set T consists 
of all pairs where i, j are on different strands, and the distance between the two residues in the native structure, •, 
is below a cutoff: rf j < R c = 0.8 ran. These pairs are involved in tertiary interactions between the two a-helical 
coils. All other non-neighboring pairs that do not satisfy the criteria for S or T fall into the set 1Z, and only interact 
via repulsive Lennard- Jones potentials. 

The variant SOP Hamiltonian for LZ26 has the form: 

"HSOP = ( r ^+! - r M+l) 2 + "Tp E I 1 ~ COS (0M+l,i+2 - °0)] A i,i+l,i+2 

i—1 i=l 

- £hb t 1 + fc bond( r iJ - r i,j) 2 + fc ang ~ S i,j,j-l) 2 + ~ ^i+l,i,j) 2 

W)es (S34) 

+((f> i+ i,i,j,j-i - 4>i+i,i,j,j-i) 2 )] 

+ E xMiJ) - e s \V L3 ( r ^A + £ e IC J^-)\ 

(»,j)er V JJ / (i,j)en v 4j/ 

The first term is the nearest-neighbor bond potential, where r^j is the distance between residues i and j, and the spring 
constant fcbond = 200 kcal/mol-nm 2 . The second term is the bond angle potential, with the spring constant fc ang = 2 
kcal/mol. The angle between the bonds and (j, k) is 6ij t k, an d the equilibrium value #o = 0.5837T rad = 105°, a 
typical bond angle in protein structures [12]. The factor Aij,k = 1 if i, j, k are all on the same strand, otherwise. 
The relative softness of the bond angle potential, together with the form of the secondary structure interactions 
detailed below, ensure that the two strands in the unfolded LZ26 (with all inter-strand tertiary contacts broken) have 
a persistence length of <~ 0.7 ran, consistent with experimental measurements [10]. 

The third term in Eq. (S34) accounts for the effects of hydrogen bonding along the a-helical backbone, and is 
based on a similar form developed for RNA [13]. We mimic the directionality-dependence of hydrogen bonds by 
making the bond energy depend not only on the distance rjj, but also on bond and dihedral angles defined by the 
four residues i + 1, i, j, and j — 1, with \i — j\ = 4. For each there are two bond angles, ^jj.j-i and 0j+i,» )i7 -, 
and one dihedral angle, 4>i+i,i,jj~i- The equilibrium values of the angles, denoted by a superscript 0, are calculated 
from the corresponding quantities in the native structure. Only when the distance, bond angles, and dihedral angles 
are all simultaneously equal to the equilibrium values does the hydrogen bond potential reach its energy minimum 
~ e hb> where £hb > 0. Thus, the minimum is reached only when the entire [i, i + 4) strand segment adopts a structure 
resembling a single a-helical turn. The a-helical propensity of an (i, i + 4) segment is determined by the energy scale 
£hb and the sensitivity parameters k^ nd , fcang- Larger values for the sensitivity parameters increase the brittleness 
of the a-helix, making it more likely to be destablized due to thermal fluctuations. To calibrate the parameters, we 
define a helix function H(i,j) for any € S, 



m,i) = 



1 




(S35) 
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reflecting the RMS deviation of the bond distances and angles from their equilibrium values. We use H(i,j) as a 
measure of helix content, by counting the fraction of pairs in S where H(i,j) is less than a cutoff H c = 0.5. It is known 
from thermal denaturation experiments on GCN4 [14] that the individual a helices upon unzipping are unstable, with 
w 17% helical content. In contrast, the tertiary contacts in the coiled-coil structure stabilize helix formation, resulting 
in a much higher helical content of w 81%. We expect qualitatively similar behavior in LZ26 in the case of force 
denaturation, and thus tune the sensitivity parameters to yield a large difference in the helix content between the 
unfolded and folded states. The parameter values are set at £hb = 3.85 kcal/mol, fcbond = 10 nm~ 2 , fc^„ g = 40 rad -2 . 
For these values we find a helix content of 3% and 82% respectively for the unfolded and folded states of LZ26 under 
a constant force of F = 12.3 pN. 

The fourth term in Eq. (S34) describes tertiary interactions between the two strands of LZ26, (i, j) € T. These have 
a residue-dependent energy x\ e BT(hj) — £s|- Here x is an overall prefactor, cbt(*, j) is the Betancourt-Thirumalai 
(BT) contact energy for residues i and j [15], and e s shifts the zero of the energy scale [16]. To get a leucine zipper 
that unfolds at the experimental force scale of ~ 12 pN, we choose \ = 2.25 and energy shift e s = 0.7 k B T. The 
tertiary interactions use a modified Lennard- Jones potential of the form: 

VM = K~ =f 6 ~ 1 X l\- (S36) 
I x lz — 2x° x > 1 

This has the standard 12-6 form at large distances, but a softer short-range repulsive core, increasing with the inverse 
6th rather than 12th power. The choice of the softer potential is made to allow for a longer simulation time step, 
while not having a significant impact on the large-scale dynamics of the system [17]. 

The final term in Eq. (S34) describes purely repulsive interactions among the remaining non-neighboring pairs, 
€ 1Z, with energy factor e rop = 1 kcal/mol and range a = 0.38 nm. We use the inverse 6th power in the repulsive 
potential for the same reasons as above. 

B. Semifiexible bead-spring model for the DNA handles 

Each double-stranded DNA handle is modeled as a chain of beads of radius a = 1 nm, corresponding to a 
contour length L = 2aN^. The handle Hamiltonian is: 

, N h -1 , , _ jV h -2 

n h = ^ £ (r M+1 - 2a) 2 + ^ £ [1 - cos(0 M+M+2 )] (S37) 

2—1 i— 1 

where r^j+i are the distances between neighboring beads, fct> nd = ^00 kcal/mol-nm 2 , l p is the persistence length, and 
0j,i+i,i+2 are angles between consecutive bonds. The two terms are stretching and bending energies respectively. The 
handle elastic modulus 7 = 2a/cbond = 2780 pN. For the persistence length we consider, l p = 20 nm, at the applied 
tension due to the traps, the handles (and unfolded portions of the protein) are almost fully extended, and there is 
negligible probability of the chain overlapping itself or protein residues in the vicinity of the handle attachment point. 
Hence, there is no need to include excluded volume interactions for the handles. The covalcnt linkers that attach the 
handles either to the cysteine residue at the protein N-terminus or a point on the bead surface are modeled as simple 
harmonic springs with strength n — fcbond and length I — 1.5 nm. 

C. Simulation time scales 

Let jLto = 1/Qirrja be the mobility of a sphere of radius a = 1 nm, where r\ — 0.89 mPa-s is the viscosity of 
water at T — 298 K. This will be the mobility of our DNA handle beads, while for the large polystyrene beads 
the corresponding mobility is ^ = ^oa/Rb- The rotational diffusion of the polystyrene bead is characterized by 
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a mobility /x£ — i^ a a/AR\. For the protein residues we choose a mobility /i rcs = 3.36^0, corresponding to an 
effective hydrodynamic radius of 0.30 nm. The characteristic Brownian dynamics time scale associated with no is 
To = a 2 / hqUbT = 4 ns. To avoid numerical errors, our simulation time step t should be a small fraction of To, and 
we obtained reliable results using r = 5 x 10~ 6 t = 0.02 ps. For LZ26 both with and without handles/beads, we 
ran ps 260 long trajectories at various force conditions (or trap separations), totaling to w 10 12 simulation time steps, 
or 20 ms, with data collection every 10 4 steps. (In the case of the simulations involving the GRM hairpin instead 
of the protein, the time step t = 3 x 10~ 4 to = 1.2 ps, and the total trajectory data for each GRM parameter set 
corresponded to 160 — 180 ms.) 

VI. FINITE BANDWIDTH SCALING (FBS): CORRECTING FOR THE EFFECTS OF ELECTRONIC 

FILTERING, TIME AVERAGING, AND NOISE 

Before the data from optical tweezer experiments can be used to reconstruct the intrinsic biomolecule free energy 
landscape, one must consider the inevitable distortions due to noise, the electronic systems involved in data recording, 
and any additional filtering done as part of the collection protocol. We have developed a method, finite bandwidth 
scaling (FBS), to correct for these distortions. In the following we first derive the basic FBS scaling relations, and 
then verify them using both simulation and experimental data sets. 



Understanding how the time series of bead positions is distorted as part of the measurement process requires a 
detailed spectral analysis of all components in the dual optical tweezer apparatus. The spectral properties of the 
experimental system used to collect the data in our work have been extensively characterized by von Hansen et. 
al. [18] , allowing us to develop a simplified theory which approximates the most important sources of distortion. Our 
theory fits all the experimental data sets under consideration, but it can be easily modified to include additional 
complications that we ignore (for example crosstalk between the two laser traps) as well as the details of other 
experimental setups. 

Let z\™{t) be the trajectory of bead-bead separations along the z-axis recorded during the experiment. This raw 
data set is based on the signal from the silicon photodiode devices that measure the deflection of the lasers due to 
bead displacements. This output is then processed and amplified by the electronic system used in the recording 
apparatus. If z tot {t) is the actual trajectory of bead displacements, inaccessible to the experimentalist, the recorded 
output z\^{t) is related to Ztot(*) as: 



The deviation of z\^{t) from z to t{t) stems from two main effects: (i) an additive noise component f](t), which includes 
environmental noise like vibrations of the optical elements in the apparatus and electronic noise in the detectors [18]. 
For simplicity, we model the noise as Gaussian white noise with zero mean and variance equal to v. (rj(t)) = 0, 
{rj{t)rj(t')) = u8{t — t'), and (r?(t)ztot(i')) = 0; where ( ) denotes an equilibrium ensemble average; (ii) convolution 
with a kernel function f(t), which reflects the filtering properties of the photodiodes and electronics. Any additional 
time averaging or filtering carried out by the experimentalist on the recorded data series will be considered explicitly 
later on, and is not included in f(t). The analysis of Rcf. [18] yielded the following form for the filter kernel in the 
frequency domain, 



A. FBS theory 




(S38) 



f(w)= A + 



1-A 1 



(S39) 



1 - iCJTp/J B 8 (iujT bf ) 
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where for our LOT setup A = 0.6±0.05, r p / = 6±1 [is, Tf,f — 5 /is, and B s (x) is the 8th order Butterworth polynomial. 
The term in the square brackets above originates in a physical phenomenon known as "parasitic filtering" [19, 20], 
arising from the transparency of the silicon in the photodiode to the laser light with wavelength 1064 nm used in the 
experiment: a fraction 1 — A of the photocurrent from the detector is produced with a lag time r p f relative to the 
photon signal. The second term in Eq. (S39), involving the Butterworth polynomial, is due to the subsequent electronic 
amplification of the signal from the detector, which acts like a Butterworth lowpass filter with characteristic timescale 
Tbf, such that at the frequency lo = the signal amplitude is attenuated by 3 dB. Since the form of Eq. (S39) 
is too complicated for use in our analytical theory, we will approximate f(t) as a generic first-order low-pass filter, 
exploiting the fact that both the parasitic and electronic terms act to attenuate high-frequency portions of the signal, 

5 , (S40) 

1 - VjJTf 

where r/ = 7 ixs. This effective filtering timescale r/ is derived by demanding that Eq. (S40) exhibit the same degree 
of attenuation at u> = t^ 1 as Eq. (S39). 

Though these distortions are expressed in the frequency domain, they have observable consequences for the equi- 
librium probability distribution of bead-bead separations. As an example, consider the raw autocorrelation function 
C r aw(i) = ((zloT(t) — zloT)( z toT( Q ) ~ ^toD)) where z\l™ is the mean recorded bead-bead separation. The variance of 
the raw probability distribution ^toTl^toD i s equal to C raw (0). From Eqs. (S38) and (S40), the raw autocorrelation 
is related to the true one, C(t) = ((z tot (t) - z tot )(z tot (0) - z tot )), by: 

v r°° e -\ t - t '\/ T t 

C raw (t) = T—e-^ + / dt' e — C(t'). (S41) 

2r f ./-co 2t/ 



The first term in Eq. (S41), due to noise, tends to increase the variance C raw (0) relative to C(0). The second term, 
due to filtering, is always less than C(0), since it is an average over C(t'), and C(t' ^ 0) < C(0). Noise broadens 
the measured distribution, and filtering narrows it. However without knowing the amplitude of the noise v, it is 
unclear whether the filtering due to the detectors and electronics under- or over-compensates for the noise, and how 
far ^tot^totT) deviates from the true distribution T-tot^tot)- Thus, we need a way to estimate v. 

The situation is even more complicated since the experimentalist may choose to apply additional filtering on the 
recorded data, for example as a way of manually removing noise and unwanted high frequency components of the signal 
(since the dynamics of interest typically occur at frequencies much lower than imposed filter cutoff). For the GCN4 
leucine zipper, the data sets recorded at 100 kHz (corresponding to a sampling time step t s — 10 /is) were subsequently 
filtered in real time during collection by averaging every 5 consecutive time steps together. Such averaging acts like 
a low-pass filter, and so has narrowing effects on the equilibrium probability distribution qualitatively similar to 
the filtering described above. Some type of additional filtering of this kind is a common experimental practice (see 
Refs. [21], [22], and [23] for recent examples, involving either an averaging or 8 pole Bcsscl filter). It turns out, however, 
that we can take advantage of the filtering protocol: by varying the degree of filtering we will use it to approximately 
extrapolate features of the true probability distribution. 

Let us concentrate on the simple case of filtering the recorded data by averaging every n consecutive points into 
a single value. If the collection time step is r s , the original raw data is represented by the recorded time series 
{z™™(tj)}, where tj = jr s for j = 1,2,3, . . .. The averaged data is a time series {z T t ™' n (t n j)}, where z^' n (t n j) = 
rT 1 Y^l=nj-n+i z lot{ti)- F° r the averaged time series we will focus on two quantities, both related directly to its 
autocorrelation C raWi „(i): the variance {{z\^ ,n — z\^' n ) 2 ) = C raWj „(0), and the mean-squared displacement (MSD) 
between consecutive points, ((z™'"^) ~ z to7' n (0)) 2 ) = 2(C raW! „(0) - C raw ,„(nT s )) = A law , n (riT s ). In a more 
complicated fashion, these two quantities can also be expressed in terms of the original autocorrelation C raw (i) = 
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C raw ,i(t) before averaging: 

,,- i 

n """ 1 ' n z 



1 2 ™ _1 

C raw ,n(0) = -C raw (0) + y2(n- j)C Iaw (jT s ), 



° 1 _ (S42) 

2 2 n 2 n 
A raw „(nr s ) = -C raw (0) + V(2n - 3j)C raw (jr s ) V(n - j)C raw ((n + j)r s ). 

We know that C raw (t) is related to the unknown true correlation C{t) through Eq. (S41), so we can complete the 
theoretical description by specifying a form for C(t). A generic correlation function can be expanded as a sum of 
exponentials, C(t) = A.% exp(— t/n), with relaxation times t\ < t 2 < • • • . We will be interested in correlations 

on the shortest accessible time-scales, t <~ 0(t s ), so we plug the expression for C(t) into Eq. (S41) and expand for 
small t, keeping the contribution from the n exponential and lowest order corrections from the Tj>i terms: 

oo . 

f j= i • / - (g 43 ) 

- ^ t,T1 + ^2 {^- t/Tl - r f e^) +A C - B& + t/e"*^), 

/ If 

where A c = X)i^2 Ai, B c = A 2 /t 2 . If necessary, the expansion can be extended to higher orders, but the above form 
was sufficient to fit all the simulation and experimental cases which we analyze below. 

Eqs. (S42)-(S43) completely define the variance C raWi „(0) and MSD A raWi „(nr s ) in terms of five unknown param- 
eters: v, Ai, n, A c , and B c . By averaging the recorded time series {z\^{tj)} for different values of n (varying the 
effective filter bandwidth), we construct curves of C raW)n (0) and MSD A raWj „(nT s ) as a function of n. Fitting these 
curves to Eqs. (S42)-(S43), we can then extract the unknown parameters. This allows us to estimate the true variance 
of the probability distribution, 

C(0) = Ai+ Ac. (S44) 

Since we are using properties of time series at different effective bandwidths to gain information about the true, 
"infinite" bandwidth limit, we call our method finite bandwidth scaling (FBS). The analogy is to finite size scaling [24], 
where thermodynamic properties of systems on finite lattices are extrapolated to the infinite lattice limit. One of 
the nice features of FBS is that the scaling analysis can be carried out even when we can only calculate C raWi „(0) 
and A raWi „(riT s ) for a subset of n values. For example, in the leucine zipper case below, the available time series 
corresponds to n = 5, since the data was time averaged during collection. From the n = 5 data we can construct 

trajectories for n — 10, 15, 20, This subset is sufficient for the FBS extrapolation. 

Once we know C(0), how can we use it to approximately reconstruct the true distribution "Ptot(ztot)? Keep in mind 
that the variance C(0) = (z% ot ) — (z tot ) 2 — J dz (z — (z tot )) 2 Vtot( z )- The simplest estimate for 7 3 tot(ztot) is to start 
with the measured, averaged distribution Vl^' n for some n, and deform it in one of two ways, changing its variance 
by an amount 8c = |C(0) — C raWj „(0)|: (i) If C raWi „(0) < C(0), we carry out a convolution with a normalized Gaussian 
of variance 8c , 

roo p -z 2 /2S c 

Ptot(*at) « J _ dzVi:r(z tot - z)-j=. (S45) 
(ii) If C raW!n (0) > C(0), we do a deconvolution instead, solving 

rco p ~z 2 /26 c 

n:r( Ztot )^ d*^^ (S46) 

for Vtot- The latter can be carried out using the numerical deconvolution technique described in Sec. IF. After the 
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deformation, the estimated 7-tot will by construction have the correct variance C(0). We should recover roughly the 
same Ptot starting from V^' 71 for any n in the range where the FBS scaling is valid, as we will demonstrate in the 
examples below. In systems with multiple states, where there is more than one peak in the measured distribution, 
it is more accurate to carry out the FBS analysis separately on each state, and apply the corresponding specific 
deformation for each peak. This can be done with the aid of hidden Markov model [25] partitioning of the time series, 
as described in the next section for the case of the GRM and leucine zipper. 

The FBS method has several limitations: (i) using a Gaussian to deform Vl^' n into 7-tot is an assumption, since all 
we strictly know about the actual point-spread function is the variance 5c- The smaller the variance, the more valid 
the assumption, since the potential non-Gaussian contributions to the point-spread function become less significant. 
We can also test the assumption from our measured data, by checking whether the V[™ ,n for various n can be 
mapped to each other by Gaussian deformations. For all the systems analyzed in our work this is indeed the case, 
(ii) Gaussian deformations map individual peaks into slightly broader or narrower peaks, but do not produce new 
peaks. Hence, if there is a state with a very short lifetime that is smeared out by the filtering (either the parasitic, 
electronic, or additional filtering), yielding no distinct peak in V\^ ,n , the FBS method will not be able to reconstruct 
its properties. Whether or not the experimentalist chooses to do additional averaging, the intrinsic time resolution 
Tf of the apparatus puts fundamental constraints on what we can learn from the measured time series. Transitions 
occurring on timescales faster than r/ will be lost to us. (iii) In a similar way, the characteristic relaxation times 
of the trapped beads also impose limits. To illustrate this, take two protein states Si and 5*2, which have a small 
difference in their mean end-to-end distance along the force direction, and assume S\ is only accessible from S 2 - If 
the mean lifetime of Si is much smaller than S 2 > such that it is shorter than the bead relaxation time, transitions like 
S*2 — > Si — > S2 will correspond to only negligible excursions in the measured trajectory of bead displacements, since 
the beads do not have enough time to relax to the equilibrium position associated with Si before the protein returns 
to S 2 - If the mean-squared distance of the excursions is smaller than the noise amplitude in the recorded time series, 
the existence of state Si will be hidden from the experimentalist, regardless of the apparatus filtering timescale 77. 
In summary, the distribution produced by FBS is an approximation to the truth: the method can correct distortions 
produced by noise and filtering, but it only works for states in the energy landscape which leave some signature of 
themselves in the measured time series. 



TABLE S2. Parameters used in the FBS analysis of simulation and experimental systems 



T s Tf V 


Ai 




A c 


B c 


[fis] {(is] [nm 2 ^is] 


[nm 2 ] 


H 


[nm 2 ] 


[nm 2 //is] 


Simulation: GRM 


State N 0.024 


2.2 ± 1.2 


3.2 ± 1.0 


3.1 ±1.2 


0.24 ±0.15 


State U 0.024 


1.6 ±0.2 


2.1 ± 0.2 


2.6 ±0.2 


0.22 ±0.04 


Experiment: dsDNA handles (no protein) 


F = 9.4pN 10 7 31.8 ±1.5 


5.0 ±0.2 


14.7 ± 1.8 


3.2 ±0.2 


0.0077 ±0.0020 


F Q = 11.5 pN 10 7 31.8 ±1.5 


4.0 ±0.2 


12.6 ± 1.4 


3.0 ±0.1 


0.0079 ±0.0012 


F = 12.7pN 10 7 31.8 ±1.5 


3.3 ±0.2 


11.5 ± 1.2 


2.9 ±0.1 


0.0076 ± 0.0008 


Experiment: GCN4 leucine zipper (trajectory 1) 


State 11 10 7 31.8 ±1.5 


3.9 ±0.2 


28.1 ±5.0 


3.5 ±0.2 


0.0066 ±0.0010 


State 12 10 7 31.8 ±1.5 


8.6 ±0.6 


41.9 ±8.0 


4.2 ±0.8 


0.0069 ± 0.0036 


State U 10 7 31.8 ±1.5 


7.6 ±0.1 


23.7 ± 1.7 


3.2 ±0.1 


0.0065 ± 0.0007 


Experiment: GCN4 leucine zipper (trajectory 2) 


State 11 10 7 31.8 ±1.5 


4.1 ±0.1 


28.1 ±3.5 


3.5 ±0.2 


0.0069 ± 0.0008 


State 12 10 7 31.8 ±1.5 


8.3 ± 1.0 


41.3 ± 12.0 


5.3 ± 1.3 


0.0074 ±0.0053 


State U 10 7 31.8 ±1.5 


8.1 ±0.2 


23.6 ±2.2 


3.2 ±0.2 


0.0066 ± 0.0009 
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FIG. S4. FBS analysis of a GRM Brownian dynamics simulation (ztrap = 1298 nm; all other parameters as in Table SI), (a) 
The probability distribution of the bead-bead separation from the raw simulation data, VloT (gray), and the decomposition 
into individual Gaussian peaks corresponding to the N (blue) and U (red) states. Distances are measured with respect to z, the 
mean bead-bead separation, (b) A sample time series fragment from the simulation, with the individual data points colored 
according to their assignment to the N (blue) and U (red) states by hidden Markov model analysis, (c) For the raw time series 
filtered by averaging together every n data points, the variance C raw ,n(0) as a function of n. The time series corresponding 
to each state, N and U, is analyzed separately, and plotted as blue and red points respectively, with bars denoting standard 
error due to finite sampling. The solid curves are the FBS theoretical fits to C r aw,ti(0) [Eq. (S42)]. Best-fit FBS parameters are 
listed in Table S2. (d) Analogous to (c), except showing the MSD function A raWi „(nr s ) for consecutive pairs of points in the 
averaged time series, (e) The raw distributions V^'" f° r the- averaged time series at n = 1, 50, 100, 150. (f) Solid curves: the 
distribution 7-tot estimated by applying the appropriate FBS correction to the raw distributions in (e). There are four curves, 
but due to overlapping they appear as one. Points: the raw distribution Vl^' n for n = 1 (no averaging), which for the GRM 
case is the true distribution, since there are no noise or apparatus filtering effects in the simulation. 



B. Testing FBS on simulation and experimental data 



As a first test of the FBS theory, we analyze a Brownian dynamics simulation trajectory of the GRM model in 
an optical tweezer setup (Fig. S4). The trap separation z trap = 1298 nm, and all the other parameters are listed in 
Table SI. A computer simulation has perfect recording of data, with no environmental noise or apparatus filtering 
effects, hence it can test the FBS theory of Eqs. (S42)-(S43) in the limit v = Tt — 0. In this case the true distribution 
is just the n = 1 raw distribution Vg? _ -p™^\ plotted in Fig. S4(a) ( gray curve). If the FBS scaling is valid, we 
should be able to map any distribution for n > 1 onto the n = 1 result by applying the FBS correction procedure 
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described above. 

The GRM model exhibits two states, native N and unfolded U, which have distinct dynamical properties. Hence 
it is more accurate to apply the FBS method separately to just those portions of the time series belonging to each 
state. Partitioning the time series by state requires estimating the most likely sequence of states that corresponds to 
the data. Hidden Markov modeling (HMM) [25] is a general tool for this task. The probability distribution can be 
accurately decomposed into Gaussians corresponding to each state, as depicted in Fig. S4(a), which define likelihoods 
for any observed z to t data point in the trajectory to belong to one or the other state. We then process the entire 
trajectory through the Baum- Welch algorithm [26], to find optimal values for the unknown transition probabilities 
between states, and finally construct the most likely state sequence using the Viterbi algorithm [27]. Fig. S4(b) shows 
a fragment of the trajectory, colored according to the state assignment resulting from HMM. 

The variance C raWj „(0) and MSD A raWj „(nT s ) are then calculated as a function of n from the trajectory fragments 
belonging to a certain state. For a given n, the calculation involves averaging over data points within a time window 
up to 2nr s in length, so getting good statistics requires having many fragments longer than 2nr s . This will be true 
so long as 2nr s is much smaller than the mean lifetime of the state, putting a practical upper bound on n. Fig. S4(c) 
and (d) plot the results for C raWi „(0) and A raWj „(nT s ) respectively (blue points: N state; red points: U state). Bars 
represent statistical standard error due to finite sample size, as determined through jackknife estimation [28]. The 
solid curves are fits to Eqs. (S42)-(S43), from which we extract the FBS parameter values listed in Table S2. 

With these values in hand, we can carry out the correction procedure: Fig. S4(d) shows the raw distributions Pl™' n 
for n = 1,50,100,150 (solid curves), and panel (e) shows that same distributions after they have been corrected 
according to the method outlined above (n = 1 needs no correction, but is included for comparison). The greater 
the degree of averaging (increasing n), the narrower the peaks in Vl^' n . However, the FBS method compensates for 
this, and all the distributions in (e) have collapsed onto a single estimate for the true Vtot- As expected, this estimate 
agrees very well with the n = 1 result P™™ (cyan points). 

The second test of the FBS theory is on experimental data for a system with only dsDNA handles and beads, 
discussed in Sec. II. FBS results for three different trajectories (corresponding to three values of the mean force F ) 
are presented in Fig. S5. These data sets were recorded with a sampling rate of 100 kHz (r s = 10 /xs), with no 
additional averaging beyond the unavoidable filtering effects of the detectors and electronics. As a consequence of 
these effects, Vl™' 1 is not the same as the true distribution, and the deviation grows larger as n is increased. The 
FBS best-fit results are shown in Table S2. In the fitting the noise amplitude v is constrained to be the same among 
all three trajectories, since they are all collected on the same equipment. Like in the previous example, Vl^' n for 
various n can all be collapsed onto a single estimate for Vtot through the FBS method. In Fig. S5(e) this estimate 
(solid curves) is compared to Vl^' 1 (dashed curves), to emphasize that the distortions due to apparatus filtering are 
small but noticeable. 

The final test is on the GCN4 leucine zipper experimental time series (trajectory 1, with parameters described 
in Table SI). As mentioned earlier, here we only can construct averaged data sets for n = 5, 10, 15, . . .: the n = 1 
trajectory, at the original r s = 10 /US sampling interval, is not available. Despite this limitation, the FBS scaling 
analysis works nicely, with results summarized in Fig. S6 and Table S2. We took advantage of the fact that the 
handle-only data sets, collected with the same optical tweezer setup as the leucine zipper (except with no protein), 
had direct information about n = 1 timescales, and thus probed higher frequencies than were accessible in the leucine 
zipper data. Since going to higher frequencies gives us better estimates of the background noise, we set the noise 
amplitude v in the leucine zipper case to the best-fit value from the handle-only analysis. All other FBS parameters 
were fit individually for each state (II, 12, and U) in the leucine zipper distribution. With FBS corrections, V™ ,n 
for n going up to 25 (effective bandwidths as low as 4 kHz) all collapse onto a single estimate of the true Vtot- 
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FIG. S5. FBS analysis of experimental results for a system containing only dsDNA handles and beads (see Sec. II for apparatus 
parameters). We analyze three separate trajectories at different constant trap separations, corresponding to mean forces 
F — 9.4 ± 0.7, 11.5 ± 0.8, and 12.7 ± 0.9 pN. The results in each panel are labeled by the F value of the trajectory, (a) For 
the raw experimental time series filtered by averaging together every n data points, the variance C ra w,n(0) as a function of 
n. The results are plotted as points, with bars denoting standard error due to finite sampling. The solid curves are the FBS 
theoretical fits to C ra w,n(0) [Eq. (S42)]. Best-fit FBS parameters are listed in Table S2. (b) Analogous to (a), except showing 
the MSD function A raWjn (nr s ) for consecutive pairs of points in the averaged time series, (c) The raw distributions PtoT'" Ior 
the averaged time series at n = 1, 2, 5, 10. Each row corresponds to a different trajectory, (d) Solid curves: the distribution Vtot 
estimated by applying the appropriate FBS correction to the raw distributions in (c). For each trajectory there are four curves, 
but due to overlapping they appear as one. Dashed curves: the raw distribution V^t''" f° r n — 1. Though this distribution is 
free of any additional time averaging carried out on the recorded time series, it is subject to parasitic and electronic filtering 
effects intrinsic to the apparatus. These distortions are corrected by FBS, and hence the dashed and solid curves are distinct. 



VII. ESTIMATING UNCERTAINTIES IN THE FREE ENERGY RECONSTRUCTION 

The free energy reconstruction is only as good as the data on which it is based: the recorded time scries which is 
the input, and the information about the apparatus which is used to analyze the time series and predict the intrinsic 
landscape. Both of these are subject to uncertainties, which will propagate into the final result. Let us first consider 
statistical uncertainties due to the finite length of the trajectories from which the input probability distribution V^™ 
is determined. One of the advantages of the double optical trap setup is that it is exceptionally stable, allowing for 
data collection over periods > 100 s. In the case of the leucine zipper, the slowest transition (from U to 12) occurs 
on timescales of 0.4 — 0.6 s in the force range of interest, so even a single trajectory contains ~ O(10 2 ) of the rarest 
observed conformational changes. 

Thus the distribution VI™ has small statistical uncertainties. To quantitatively estimate the error, we use a block 
bootstrap method [29, 30] in the following manner: the trajectory is divided into blocks of length larger than the 
longest autocorrelation time, and a synthetic data set of the same length is generated by sampling with replacement 
from this set of blocks. Using a large number of synthetic data sets (> 500) we can determine confidence intervals for 
each point in the V{™ distribution. The number of blocks is varied until convergence is achieved in the error estimate. 
The results are shown in Fig. S7(a-b) for two leucine zipper experimental trajectories (parameters as in Table SI). 
The VI™ distributions (black curves) are surrounded by dark blue bands which represent the 68% confidence interval, 
or standard error margin. The median error in the z range where V™ > 10 -6 , is 10% and 19% respectively for the 
two trajectories. 

In reconstructing the intrinsic free energy landscape J- p , this statistical error is compounded by uncertainties in 
all the apparatus parameters that are used in the analysis: bead radii, trap strengths, handle properties, as listed in 
Table SI, as well as uncertainties in the FBS parameters used to correct the raw distributions (Table S2). We perform 
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FIG. S6. FBS analysis of a GCN4 leucine zipper experiment (trajectory 1, with parameters given in Table SI), (a) The 
probability distribution of the bead-bead separation from the raw experimental data, ViSt (g ra y)> anCl the decomposition into 
individual Gaussian peaks corresponding to the II (red), 12 (green), and U (blue) states. Distances are measured with respect 
to zn, the position of the II peak, (b) A sample time series fragment from the experiment, with the individual data points 
colored according to their assignment to the II (red), 12 (green), and U (blue) states by hidden Markov model analysis, (c) 
For the raw time series filtered by averaging together every n data points, the variance C ra w,n(0) as a function of n. The time 
series corresponding to each state is analyzed separately, and plotted as points in distinct colors, with bars denoting standard 
error due to finite sampling. The solid curves are the FBS theoretical fits to Cr aw ,ii(0) [Eq. (S42)]. Best-fit FBS parameters are 
listed in Table S2. (d) Analogous to (c), except showing the MSD function A raWi „(nr s ) for consecutive pairs of points in the 
averaged time series, (e) The raw distributions Vl™' n for the averaged time series at n — 5, 10, 15, 20, 25. (f) Solid curves: the 
distribution Vtot estimated by applying the appropriate FBS correction to the raw distributions in (e). There are five curves, 
but due to overlapping they appear as one. Dashed curve: the raw distribution V™™'™ for n — 5. 



a Monte Carlo error estimate, by sampling from Gaussian distributions of these parameters with standard deviations 
given by the uncertainties, and for each parameter set performing the complete free energy reconstruction on the entire 
ensemble of synthetic data sets generated by the block bootstrap. In order to analyze the shape differences among the 
reconstructed landscapes, every T p is projected to the mid-point value of Fq where the probabilities of states II and 
U are equal. (Fq = 12.3 ± 0.9 pN and 12.1 ± 0.9 pN from trajectories 1 and 2 respectively.) Though computationally 
intensive, this procedure allows us to estimate 68% confidence intervals for J-p shown as light blue bands for the two 
trajectories in Fig. S7(c-d). For comparison, if one assumed no uncertainty in the apparatus parameters, one would 
get the much narrower dark blue bands, representing just the error in J- p from the finite sampling of . Clearly, 
the uncertainties in the apparatus parameters are the predominant source of error. 
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FIG. S7. Estimation of uncertainty in the free energy reconstruction, as discussed in Sec. VII. (a-b) Probability distributions 
PtoiT °f the raw time series for total bead-bead separation (black curves) collected during two experimental runs (left/right 
columns; see Table SI for parameters). Distances are measured with respect to zn, the position of the II peak. The dark blue 
band corresponds to the standard error (68% confidence interval) for each point in the distribution, due to finite sampling, 
(c-d) The corresponding intrinsic protein free energy T v (black curves), as calculated using the procedure described in the main 
text. The free energies are in the constant force ensemble, at the mid-point force value Fo where the probability of being in 
states II and U is equal (Fo — 12.3 ± 0.9 pN for run 1, 12.1 ± 0.9 pN for run 2). The dark blue band represents standard error 
(68% confidence interval) including just the uncertainty due to finite sampling; the wider light blue band is the standard error 
including all sources of uncertainty (sampling and apparatus parameters). 



With both apparatus and sampling uncertainties included, the median standard error over the z range where 
T p < — fcsTln(10 -6 ) « 14fceT is 10% in both trajectories. This corresponds to ~ OAksT deviations in the shape of 
the landscape. The median difference between J-p estimated from the two trajectories in this range is 0.3fcsT, and 
hence our free energy analysis gives a consistent result, within standard error, between the two different experimental 
runs. 
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